Search | VHL Regional Portal

1.

Editorial: Data science and artificial intelligence for (better) science.

Burgelman, Jean-Claude; Wang, Kuansan.

Front Res Metr Anal ; 8: 1177903, 2023.

Article in English | MEDLINE | ID: mdl-37168435

2.

Public use and public funding of science.

Yin, Yian; Dong, Yuxiao; Wang, Kuansan; Wang, Dashun; Jones, Benjamin F.

Nat Hum Behav ; 6(10): 1344-1350, 2022 Oct.

Article in English | MEDLINE | ID: mdl-35798885

ABSTRACT

Knowledge of how science is consumed in public domains is essential for understanding the role of science in human society. Here we examine public use and public funding of science by linking tens of millions of scientific publications from all scientific fields to their upstream funding support and downstream public uses across three public domains: government documents, news media and marketplace invention. We find that different public domains draw from various scientific fields in specialized ways, showing diverse patterns of use. Yet, amidst these differences, we find two important forms of alignment. First, we find universal alignment between what the public consumes and what is highly impactful within science. Second, a field's public funding is strikingly aligned with the field's collective public use. Overall, public uses of science present a rich landscape of specialized consumption, yet, collectively, science and society interface with remarkable alignment between scientific use, public use and funding.

Subjects

Knowledge, Mass Media, Humans

3.

CORD-19: The Covid-19 Open Research Dataset.

Wang, Lucy Lu; Lo, Kyle; Chandrasekhar, Yoganand; Reas, Russell; Yang, Jiangjiang; Burdick, Douglas; Eide, Darrin; Funk, Kathryn; Katsis, Yannis; Kinney, Rodney; Li, Yunyao; Liu, Ziyang; Merrill, William; Mooney, Paul; Murdick, Dewey; Rishi, Devvret; Sheehan, Jerry; Shen, Zhihong; Stilson, Brandon; Wade, Alex D; Wang, Kuansan; Wang, Nancy Xin Ru; Wilhelm, Chris; Xie, Boya; Raymond, Douglas; Weld, Daniel S; Etzioni, Oren; Kohlmeier, Sebastian.

ArXiv ; 2020 Apr 22.

Article in English | MEDLINE | ID: mdl-32510522

ABSTRACT

The Covid-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on Covid-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full text papers. Since its release, CORD-19 has been downloaded over 200K times and has served as the basis of many Covid-19 text mining and discovery systems. In this article, we describe the mechanics of dataset construction, highlighting challenges and key design decisions, provide an overview of how CORD-19 has been used, and describe several shared tasks built around the dataset. We hope this resource will continue to bring together the computing community, biomedical experts, and policy makers in the search for effective treatments and management policies for Covid-19.

4.

Editorial: Innovations and Perspectives in Data Mining and Knowledge Discovery.

Abe, Naoki; Liu, Huan; Wang, Kuansan.

Front Big Data ; 3: 637906, 2020.

Article in English | MEDLINE | ID: mdl-33693428

5.

Mitigating Biases in CORD-19 for Analyzing COVID-19 Literature.

Kanakia, Anshul; Wang, Kuansan; Dong, Yuxiao; Xie, Boya; Lo, Kyle; Shen, Zhihong; Wang, Lucy Lu; Huang, Chiyuan; Eide, Darrin; Kohlmeier, Sebastian; Wu, Chieh-Han.

Front Res Metr Anal ; 5: 596624, 2020.

Article in English | MEDLINE | ID: mdl-33870059

ABSTRACT

At the behest of the Office of Science and Technology Policy in the White House, six institutions, including ours, have created an open research dataset called the COVID-19 Open Research Dataset (CORD-19) to facilitate the development of question-answering systems that can assist researchers in finding relevant research on COVID-19. As of May 27, 2020, CORD-19 includes more than 100,000 open access publications from major publishers and PubMed as well as preprint articles deposited into medRxiv, bioRxiv, and arXiv. Recent years, however, have also seen question-answering and other machine learning systems exhibit behaviors harmful to humans due to biases in the training data. It is imperative and only ethical for modern scientists to be vigilant in inspecting, and prepared to mitigate, potential biases when working with any dataset. This article describes a framework to examine biases in scientific document collections like CORD-19 by comparing their properties with those derived from the citation behaviors of the entire scientific community. In total, three expanded sets are created for the analyses: 1) the enclosure set CORD-19E, composed of CORD-19 articles and their references and citations, mirroring the methodology used in the renowned "A Century of Physics" analysis; 2) the full closure graph CORD-19C, which recursively includes references starting with CORD-19; and 3) the inflection closure CORD-19I, a much smaller subset of CORD-19C that is already appropriate for statistical analysis based on the theory of the scale-free nature of the citation network. Taken together, all these expanded datasets show much smoother trends when used to analyze global COVID-19 research. The results suggest that while CORD-19 exhibits a strong tilt toward recent and topically focused articles, the knowledge being explored to attack the pandemic encompasses a much longer time span and is very interdisciplinary. A question-answering system with such an expanded scope of knowledge may perform better in understanding the literature and answering related questions. However, while CORD-19 appears to have topical coverage biases compared to the expanded sets, the collaboration patterns, especially in terms of team sizes and geographical distributions, are already captured very well in CORD-19, as the raw statistics and trends agree with those from the larger datasets.
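The enclosure and closure sets described in this abstract can be illustrated with a minimal sketch on a toy citation graph. This is not the authors' implementation: the function names (`enclosure_set`, `reference_closure`), the dictionary representation of the graph, and the paper identifiers are all assumptions made for illustration, and the inflection closure (CORD-19I) is not covered because the abstract does not specify how it is derived.

```python
from collections import deque


def enclosure_set(seed, references):
    """Seed papers plus their direct references and direct citers.

    `references` maps a paper id to the list of paper ids it cites
    (a hypothetical representation chosen for this sketch).
    """
    seed = set(seed)
    result = set(seed)
    for paper, refs in references.items():
        if paper in seed:
            result.update(refs)      # papers the seed cites
        if seed.intersection(refs):
            result.add(paper)        # papers that cite the seed
    return result


def reference_closure(seed, references):
    """Seed papers plus everything reachable by following references."""
    visited = set(seed)
    queue = deque(visited)
    while queue:
        paper = queue.popleft()
        for ref in references.get(paper, ()):
            if ref not in visited:
                visited.add(ref)
                queue.append(ref)
    return visited


# Toy graph: "c1" and "c2" stand in for seed (CORD-19-like) papers.
references = {
    "c1": ["a", "b"],
    "c2": ["b"],
    "a": ["x"],
    "b": ["x", "y"],
    "x": ["z"],
    "n": ["c1"],   # cites a seed paper
    "m": ["z"],    # unrelated to the seed except through "z"
}
seed = {"c1", "c2"}

enclosure = enclosure_set(seed, references)
closure = reference_closure(seed, references)
```

In this toy graph the enclosure set picks up the citing paper `n` but stops one hop from the seed, while the closure follows references all the way back to `z` and ignores papers that merely cite the seed.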

6.

Opportunities in Open Science With AI.

Wang, Kuansan.

Front Big Data ; 2: 26, 2019.

Article in English | MEDLINE | ID: mdl-33693349

ABSTRACT

Bolstered by ever more affordable computational power and open big datasets, artificial intelligence (AI) technologies are bringing revolutionary changes to our lives. This article examines the current trends and elaborates on the future potential of AI in making science more open and accessible. Based on the experience derived from a research project called Microsoft Academic, advocates have reason to be optimistic about the future of open science, as the advanced discovery, ranking, and distribution technologies enabled by AI are offering strong incentives for scientists, funders and research managers to make research articles, data and software freely available and accessible.

7.

A Review of Microsoft Academic Services for Science of Science Studies.

Wang, Kuansan; Shen, Zhihong; Huang, Chiyuan; Wu, Chieh-Han; Eide, Darrin; Dong, Yuxiao; Qian, Junjie; Kanakia, Anshul; Chen, Alvin; Rogahn, Richard.

Front Big Data ; 2: 45, 2019.

Article in English | MEDLINE | ID: mdl-33693368

ABSTRACT

Since the relaunch of Microsoft Academic Services (MAS) 4 years ago, scholarly communications have undergone dramatic changes: more ideas are being exchanged online, more authors are sharing their data, and more software tools used to make discoveries and reproduce the results are being distributed openly. The sheer amount of information available is overwhelming for individual humans to keep up with and digest. In the meantime, artificial intelligence (AI) technologies have made great strides and the cost of computing has plummeted to the extent that it has become practical to employ intelligent agents to comprehensively collect and analyze scholarly communications. MAS is one such effort and this paper describes its recent progress since the last disclosure. As there are plenty of independent studies affirming the effectiveness of MAS, this paper focuses on the use of three key AI technologies that underlie its prowess in capturing scholarly communications with adequate quality and broad coverage: (1) natural language understanding in extracting factoids from individual articles at the web scale, (2) knowledge-assisted inference and reasoning in assembling the factoids into a knowledge graph, and (3) a reinforcement learning approach to assessing scholarly importance for entities participating in scholarly communications, called the saliency, that serves both as an analytic and a predictive metric in MAS. These elements enhance the capabilities of MAS in supporting the studies of science of science based on the GOTO principle, i.e., good and open data with transparent and objective methodologies. The current direction of development and how to access the regularly updated data and tools from MAS, including the knowledge graph, a REST API and a website, are also described.
